Abstract: We establish an upper bound on the noncoherent capacity
pre-log of temporally correlated block-fading single-input
multiple-output (SIMO) channels. The upper bound matches the 
lower bound recently reported in Riegler et al. (2011), and, hence, 
yields a complete characterization of the SIMO noncoherent ca- 
pacity pre-log, provided that the channel covariance matrix satis- 
fies a mild technical condition. This result allows one to determine 
the optimal number of receive antennas to be used to maximize the 
capacity pre-log for a given block-length and a given rank of the 
channel covariance matrix. 

I. Introduction 

A crucial step in the design of wireless communication sys- 
tems operating over fading channels is to determine the optimal 
amount of resources to be used for channel estimation. A fruitful 
approach to address this problem in a fundamental fashion is to 
characterize the channel capacity pre-log (i.e., the asymptotic 
ratio between capacity and the logarithm of the signal-to-noise 
ratio (SNR) as SNR goes to infinity) in the noncoherent setting 
where neither transmitter nor receiver is aware of the realization
of the fading process, but both know its statistics perfectly.
While a capacity pre-log characterization for single-input single- 
output (SISO) systems is available for several fading models 
of practical interest [1]-[4], the multiple-input multiple-output
(MIMO) case is still largely open. 

The impact of multiple antennas on the capacity pre-log 
has been characterized in [5] for the Rayleigh-fading constant
block-fading model. According to this model, the channel stays
constant over a block of N channel uses and changes in an independent
fashion from block to block. The approach used in [5] to
characterize the capacity pre-log is based on an apposite change
of variables, which reveals the geometry in the problem. One
interesting consequence of the analysis in [5] is that the SISO
capacity pre-log of constant block-fading channels coincides 
with the single-input multiple-output (SIMO) capacity pre-log. 
Hence, using multiple antennas at the receiver only does not 
yield a larger capacity pre-log. 

A more accurate yet simple way to capture channel variations 
in time is to assume that the channel is correlated (but not necessarily
constant) in each block, with the rank of the corresponding
$N \times N$ correlation matrix given by Q. We shall refer to this model
as correlated block-fading. For this channel model, the SISO
capacity pre-log was determined in [3], whereas the MIMO case
is still open. A lower bound on the SIMO capacity pre-log was
recently reported in [6] and refined in [7]. The results in [6],
[7] are surprising, as they imply that, when Q > 1, the SIMO
pre-log can be larger than the SISO pre-log. 

Contributions: In this paper, we provide an upper bound on 
the SIMO capacity pre-log that matches the lower bound reported 
in [7]. Hence, the SIMO capacity pre-log is fully characterized.
Our result allows us to establish that the optimal number of
receive antennas to be used to maximize the capacity pre-log
for a given block-length N and rank Q < N of the channel
correlation matrix is $\lceil (N - 1)/(N - Q) \rceil$.

For the constant block-fading case, we provide an alternative 
and much simpler derivation of the SIMO capacity pre-log than 
the one provided in [5]. Our proof is based on duality [4] and
fully exploits the geometry in the problem unveiled in [5].

Notation: Uppercase boldface letters denote matrices and 
lowercase boldface letters designate vectors. The superscripts T 
and H stand for transposition and Hermitian transposition, respectively.
For a matrix $\mathbf{A} \in \mathbb{C}^{m \times n}$, we write $\mathbf{a}_i$ for its $i$th
column, $\operatorname{tr}\{\mathbf{A}\}$ for its trace, and $\sigma_i(\mathbf{A})$ for its $i$th largest singular
value. For a vector $\mathbf{a}$, $\operatorname{diag}\{\mathbf{a}\}$ denotes the diagonal matrix that
has the entries of $\mathbf{a}$ on its main diagonal and $a_i$ denotes the $i$th
entry of $\mathbf{a}$. We use a combination of superscripts and subscripts to
indicate sequences of random variables or vectors. For example,
$\mathbf{a}_m^n$ denotes the sequence of random vectors $\mathbf{a}_m, \mathbf{a}_{m+1}, \ldots, \mathbf{a}_n$.
We use $|\mathcal{I}|$ to denote the cardinality of the set $\mathcal{I}$. We denote
expectation by $\mathbb{E}[\cdot]$ and use the notation $\mathbb{E}_{\mathbf{x}}[\cdot]$ or $\mathbb{E}_{Q}[\cdot]$ to stress
that expectation is taken with respect to $\mathbf{x}$ with probability
distribution $Q$. The relative entropy between two probability distributions
$Q$ and $R$ is denoted by $D(Q \,\|\, R)$. For two functions $f(x)$
and $g(x)$, the notation $f(x) = O(g(x))$, $x \to \infty$, means that
$\limsup_{x \to \infty} |f(x)/g(x)| < \infty$, and $f(x) = o(g(x))$, $x \to \infty$,
means that $\lim_{x \to \infty} |f(x)/g(x)| = 0$. For two random matrices
$\mathbf{A}$ and $\mathbf{B}$, we write $\mathbf{A} \stackrel{d}{=} \mathbf{B}$ to indicate that $\mathbf{A}$ and $\mathbf{B}$ have the
same distribution. Finally, $\mathcal{CN}(\mathbf{0}, \mathbf{R})$ stands for the distribution
of a circularly-symmetric complex Gaussian random vector with
covariance matrix $\mathbf{R}$.

II. System Model 

We consider a Rayleigh-fading correlated block-fading SIMO 
channel with block-length N and M receive antennas. The main
feature of the correlated block-fading model is that the fading
in each component channel between the transmit antenna and 
each receive antenna is independent across blocks of N channel 
uses, but is correlated within each block, with the rank of the 
corresponding covariance matrix given by Q < N. We shall also 
assume that the fading is independent and identically distributed 
(i.i.d.) across component channels. The input-output (IO) relation
within a block of N channel uses can be conveniently expressed
in matrix form as follows:

$$\mathbf{Y} = \mathbf{S}\mathbf{P}^T \operatorname{diag}\{\mathbf{x}\} + \mathbf{W}. \tag{1}$$

Here, $\mathbf{x} \in \mathbb{C}^N$ contains the input symbols transmitted within
the block. We assume that $\mathbf{x}$ is subject to the following average-power
constraint:

$$\mathbb{E}\big[\|\mathbf{x}\|^2\big] \le N\rho. \tag{2}$$

The whitened fading matrix S is of size $M \times Q$ and has i.i.d.
$\mathcal{CN}(0, 1)$ entries. The $N \times Q$ matrix P, which is deterministic
and of full rank Q < N, describes the correlation structure
within a block. We shall assume that the rows of P have unit
norm, and, hence, that the entries of the matrix $\mathbf{S}\mathbf{P}^T$ are identically
distributed. Finally, the $M \times N$ Gaussian noise matrix W
has i.i.d. $\mathcal{CN}(0, 1)$ entries, and the $M \times N$ matrix Y collects the
signals from the M receive antennas during N channel uses. The
model just described is of practical relevance, because it captures
channel variation in time in an accurate but simple way: large Q
corresponds to fast channel variation. Furthermore, (1) models
accurately the IO relation in the frequency domain of a cyclic- 
prefix orthogonal frequency-division multiplexing system that 
operates over a multipath channel with Q uncorrelated taps. Note 
that, when Q = 1, the correlated block-fading model reduces to 
the constant block-fading model. 
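
To make the model concrete, the following minimal sketch draws one output block of (1) with NumPy. The helper names `complex_normal` and `simulate_block`, the parameter values, and the particular choice of P (Q columns of a DFT matrix, in the spirit of the OFDM interpretation above) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def complex_normal(rng, shape):
    """i.i.d. CN(0, 1) entries: real and imaginary parts each have variance 1/2."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def simulate_block(x, P, M, rng):
    """Draw one output block Y = S P^T diag{x} + W of the model (1)."""
    N, Q = P.shape
    S = complex_normal(rng, (M, Q))  # whitened fading, M x Q
    W = complex_normal(rng, (M, N))  # additive noise, M x N
    return S @ P.T @ np.diag(x) + W  # plain transpose, as in (1)


rng = np.random.default_rng(0)
N, Q, M, rho = 6, 2, 3, 100.0

# Hypothetical P: first Q columns of the N-point DFT matrix,
# scaled by 1/sqrt(Q) so that every row has unit norm.
n = np.arange(N)[:, None]
q = np.arange(Q)[None, :]
P = np.exp(-2j * np.pi * n * q / N) / np.sqrt(Q)

# Constant-modulus input that meets the power constraint (2) with equality.
x = np.sqrt(rho) * np.exp(2j * np.pi * rng.random(N))
Y = simulate_block(x, P, M, rng)
```

Setting Q = 1 in this sketch gives a P whose entries all equal one, which is the constant block-fading special case.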

The capacity of the channel (1) is given by

$$C(\rho) \triangleq \frac{1}{N} \sup_{Q} I(\mathbf{x}; \mathbf{Y})$$

where $I(\mathbf{x}; \mathbf{Y})$ denotes the mutual information between x and
Y in (1), and the supremum is over all probability distributions
Q on x that satisfy (2). As the noise has unit variance, $\rho$ denotes
the SNR. The capacity pre-log $\chi$ is defined as

$$\chi \triangleq \lim_{\rho \to \infty} \frac{C(\rho)}{\log \rho}.$$

III. Known Results 

In the noncoherent setting where the realizations of the fading 
process S are not known to transmitter and receiver (but P and 
the statistics of S are perfectly known), an analytic characterization
of $C(\rho)$ is not available. As we shall review next, pre-log
expressions are available for some values of N, Q, and M. 

For the SISO case (M = 1), Liang and Veeravalli [3] proved
that the pre-log is equal to $1 - Q/N$. This result can be interpreted
as follows: channel uncertainty yields a penalty of Q/N
compared to the case when the channel is perfectly known to
the receiver (in this case, capacity grows logarithmically with
SNR and the capacity pre-log is one [8]). Alternatively, we
can interpret Q/N as the fraction of channel uses in which
pilot symbols need to be transmitted to learn the channel at
the receiver [9]. When Q = N, learning the channel requires
transmitting pilot symbols in each channel use; hence, $\chi = 0$.
In this case, capacity turns out to grow double-logarithmically
with SNR, independently of the number of receive antennas [4,
Thm. 4.2].

For the special case Q = 1 (i.e., constant block-fading), the 
SISO capacity can actually be characterized up to a $o(1)$ term [2],
[5] (see [5] for a simple proof). For the SIMO case, such a
characterization is available only when $N > M + 1$ [5, Lem. 13].
However, a pre-log characterization is available for all block-length
values N. In particular, it follows from [5, Eq. (27)] that
the SIMO capacity pre-log for the Q = 1 case is equal to $1 - 1/N$,
i.e., it coincides with the SISO capacity pre-log. This result 
implies that, when Q = 1, using multiple antennas at the receiver 
only is not beneficial from a pre-log point of view. 

This statement turns out to be no longer valid when Q > 1. 
More precisely, the following result was recently proven in [7]:

Theorem 1 ([7, Thm. 1]): Suppose that P in (1) satisfies the
following Property (A): There exists a subset of indices
$\mathcal{K} \subseteq \{1, \ldots, N\}$ with cardinality

$$|\mathcal{K}| \triangleq \min\{\lceil (QM - 1)/(M - 1) \rceil,\, N\}$$

such that every Q row vectors of the submatrix of P obtained by
retaining the rows in P with indices in $\mathcal{K}$ are linearly independent.
Then the pre-log of the channel (1) is lower-bounded as

$$\chi \ge \min\{M(1 - Q/N),\, 1 - 1/N\}.$$

Theorem 1 implies that the pre-log penalty of Q/N incurred
in the SISO case by not knowing the channel at the receiver
can be reduced to 1/N by deploying multiple antennas at the
receiver side, as long as the block-length is sufficiently large
and P satisfies Property (A). In other words, one pilot symbol
per block suffices to learn the channel at the receiver. Intuitively,
Property (A) ensures that one can recover both S and $N - 1$
entries of x from the noiseless receive signal $\mathbf{S}\mathbf{P}^T \operatorname{diag}\{\mathbf{x}\}$,
once one entry of x is fixed [7].

IV. A Matching Pre-Log Upper Bound 

The main result of this paper is the following theorem: 
Theorem 2: The capacity pre-log of the channel (1) is upper-bounded
by

$$\chi \le \min\{M(1 - Q/N),\, 1 - 1/N\}. \tag{3}$$

Remarks: Theorem 2 combined with Theorem 1 yields
a complete characterization of the SIMO capacity pre-log for
the case when P satisfies Property (A). The SIMO capacity
pre-log is given by the minimum between the number of receive
antennas M times the SISO capacity pre-log of a rank-Q channel,
and the SISO capacity pre-log of a rank-1 channel. Note that
the pre-log upper bound in (3) holds independently of whether
P satisfies Property (A) or not. We expect the upper bound to
be loose if Property (A) is not satisfied. Assume now that every
$Q \times Q$ submatrix of P has full rank (a condition slightly stronger
than Property (A)). Then, (3) implies that the optimal number
of receive antennas to be used to maximize the capacity pre-log
for a given block-length N and rank Q < N of the channel
correlation matrix is $\lceil (N - 1)/(N - Q) \rceil$.
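
As an illustration of the remark above, the short sketch below evaluates the expression on the right-hand side of (3) and the antenna count at which it saturates. The function names are hypothetical. The value returned by `simo_prelog` equals the pre-log only when P satisfies Property (A) (or when M = 1, by the SISO result of [3]); otherwise it is merely the upper bound of Theorem 2.

```python
from fractions import Fraction


def simo_prelog(M: int, N: int, Q: int) -> Fraction:
    """Evaluate min{M(1 - Q/N), 1 - 1/N} exactly, as in (3)."""
    if M < 1 or not (1 <= Q <= N):
        raise ValueError("need M >= 1 and 1 <= Q <= N")
    return min(M * (1 - Fraction(Q, N)), 1 - Fraction(1, N))


def optimal_receive_antennas(N: int, Q: int) -> int:
    """Smallest M for which M(1 - Q/N) >= 1 - 1/N, i.e. ceil((N-1)/(N-Q))."""
    if not (1 <= Q < N):
        raise ValueError("need 1 <= Q < N")
    # Integer ceiling division without floating-point rounding.
    return -((N - 1) // -(N - Q))
```

For N = 4 and Q = 2, for instance, `simo_prelog` returns 1/2 for one antenna and 3/4 for two, and `optimal_receive_antennas(4, 2)` returns 2: a second antenna raises the pre-log to its ceiling of 1 - 1/N, and further antennas do not change the expression.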



Outline of the proof: The proof consists of two parts. We 
first prove that $\chi \le M(1 - Q/N)$ by generalizing to the SIMO
case the approach used in [3, Prop. 4] to establish a tight upper
bound on the SISO capacity pre-log. Then, we prove that
$\chi \le 1 - 1/N$ by showing that the capacity of a rank-Q channel with
M receive antennas can be upper-bounded by the capacity of a
rank-1 channel with MQ receive antennas. The desired result
then follows by [5, Eq. (27)]. As the proof of [5, Eq. (27)] is
rather involved, we provide an alternative, much simpler proof
of this result (for the SIMO case) in Section V-A.

V. Proof of Theorem 2

First Part: $\chi \le M(1 - Q/N)$: Without loss of generality,
we assume that the first Q rows of P are linearly independent.
This can always be achieved by rearranging the columns of Y
in (1). We start by manipulating $I(\mathbf{x}; \mathbf{Y})$ as follows (we use the
notation convention introduced in Section I):

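$$\begin{aligned}
I(\mathbf{x}; \mathbf{Y}) &= I(x_1^N; \mathbf{y}_1^N) \\
&\overset{(a)}{=} I(x_1^N; \mathbf{y}_1^Q) + I(x_1^N; \mathbf{y}_{Q+1}^N \mid \mathbf{y}_1^Q) \\
&\overset{(b)}{=} I(x_1^Q; \mathbf{y}_1^Q) + I(x_1^N; \mathbf{y}_{Q+1}^N \mid \mathbf{y}_1^Q).
\end{aligned} \tag{4}$$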



Here, in (a) we used the chain rule for mutual information and (b)
follows because $\mathbf{y}_1^Q$ and $x_{Q+1}^N$ are conditionally independent
given $x_1^Q$. We next upper-bound each term on the right-hand
side (RHS) of (4) separately. The assumption that the first Q
rows of P are linearly independent implies that the first term on
the RHS of (4) grows at most double-logarithmically with SNR.
More precisely, we have that [4, Thm. 4.2]:

$$I(x_1^Q; \mathbf{y}_1^Q) \le \log\log\rho + O(1), \quad \rho \to \infty. \tag{5}$$

For the second term on the RHS of (4), we proceed as follows:



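$$\begin{aligned}
I(x_1^N; \mathbf{y}_{Q+1}^N \mid \mathbf{y}_1^Q) &= h(\mathbf{y}_{Q+1}^N \mid \mathbf{y}_1^Q) - h(\mathbf{y}_{Q+1}^N \mid \mathbf{y}_1^Q, x_1^N) \\
&\overset{(a)}{\le} h(\mathbf{y}_{Q+1}^N) - h(\mathbf{y}_{Q+1}^N \mid \mathbf{y}_1^Q, x_1^N, \mathbf{S}) \\
&\overset{(b)}{\le} \sum_{k=Q+1}^{N} h(\mathbf{y}_k) + O(1) \\
&\overset{(c)}{\le} \sum_{k=Q+1}^{N} M \log\big(1 + \mathbb{E}[|x_k|^2]\big) + O(1) \\
&\overset{(d)}{\le} \sum_{k=Q+1}^{N} M \log(1 + N\rho) + O(1) \\
&= M(N - Q)\log\rho + O(1), \quad \rho \to \infty.
\end{aligned} \tag{6}$$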



Here, in (a) we used that conditioning reduces entropy; (b) follows
by the chain rule for differential entropy and because conditioning
reduces entropy; (c) follows because jointly proper Gaussian
random vectors are entropy-maximizers for a fixed covariance
matrix and because $\mathbb{E}[\mathbf{y}_k \mathbf{y}_k^H] = (1 + \mathbb{E}[|x_k|^2])\,\mathbf{I}_M$ (recall that
we assumed that the rows of P have unit norm); finally, in (d)
we used the average-power constraint (2). The desired upper
bound on the capacity pre-log follows by substituting (5) and (6)
into (4).



Second part: $\chi \le 1 - 1/N$: We show that the capacity of
a rank-Q channel with M receive antennas is upper-bounded by
the capacity of a rank-1 channel with QM receive antennas. By
simple matrix manipulations, we can rewrite the IO relation (1)
in the following more convenient form:

$$\mathbf{Y} = \sum_{q=1}^{Q} \mathbf{s}_q \mathbf{x}^T \operatorname{diag}\{\mathbf{p}_q\} + \mathbf{W}.$$

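Let now $\mathbf{W}_1, \ldots, \mathbf{W}_Q$ be $M \times N$ independent random matrices
with i.i.d. $\mathcal{CN}(0, 1)$ entries. As, by assumption, the rows of P
have unit norm, we have that

$$\mathbf{W} \stackrel{d}{=} \sum_{q=1}^{Q} \mathbf{W}_q \operatorname{diag}\{\mathbf{p}_q\}.$$

Hence, we can rewrite Y as

$$\mathbf{Y} \stackrel{d}{=} \sum_{q=1}^{Q} \mathbf{Y}_q \operatorname{diag}\{\mathbf{p}_q\}$$

where

$$\mathbf{Y}_q \triangleq \mathbf{s}_q \mathbf{x}^T + \mathbf{W}_q.$$

Note now that each $\mathbf{Y}_q$ is the output of a rank-1 SIMO channel
with M receive antennas. By observing that x and Y are conditionally
independent given $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_Q\}$, we conclude that,
by the data-processing inequality [10, Sec. 2.8],

$$I(\mathbf{x}; \mathbf{Y}) \le I(\mathbf{x}; \mathbf{Y}_1, \ldots, \mathbf{Y}_Q).$$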



The claim follows by noting that the $(QM) \times N$ matrix obtained
by stacking the matrices $\mathbf{Y}_q$ on top of each other is the output of
a rank-1 SIMO channel with QM receive antennas. As reviewed
in Section III, the SIMO capacity pre-log for the rank-1 case
coincides with the SISO capacity pre-log and is given by $1 - 1/N$.
This result follows from [4, Thm. 4.2], for the case N = 1, and
from [5, Eq. (27)], for the case N > 1. This concludes the proof.
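
The matrix manipulation behind the second part can be checked numerically. The sketch below uses arbitrary illustrative dimensions and a randomly drawn P with normalized rows (both are assumptions made here). It verifies that $\mathbf{S}\mathbf{P}^T \operatorname{diag}\{\mathbf{x}\}$ equals the sum over q of $\mathbf{s}_q \mathbf{x}^T \operatorname{diag}\{\mathbf{p}_q\}$, where $\mathbf{s}_q$ and $\mathbf{p}_q$ are the q-th columns of S and P, and that unit-norm rows of P give every entry of the combined noise term unit variance.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, Q = 3, 5, 2

S = (rng.standard_normal((M, Q)) + 1j * rng.standard_normal((M, Q))) / np.sqrt(2)
P = rng.standard_normal((N, Q)) + 1j * rng.standard_normal((N, Q))
P /= np.linalg.norm(P, axis=1, keepdims=True)  # enforce unit-norm rows
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Left-hand side: noiseless part of the IO relation (1).
lhs = S @ P.T @ np.diag(x)

# Right-hand side: sum of Q rank-1 terms s_q x^T diag{p_q}.
rhs = sum(np.outer(S[:, q], x) @ np.diag(P[:, q]) for q in range(Q))
identity_holds = np.allclose(lhs, rhs)

# An entry in column n of sum_q W_q diag{p_q} has variance sum_q |p_{n,q}|^2,
# which is the squared norm of row n of P.
noise_variance_per_column = np.sum(np.abs(P) ** 2, axis=1)
```

This is a sanity check on one random draw rather than a proof; the identity itself follows from writing $\mathbf{S}\mathbf{P}^T$ as the sum of the outer products $\mathbf{s}_q \mathbf{p}_q^T$.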

For completeness, in Lemma 3 below we restate [5, Eq. (27)]
for the SIMO case, and provide an alternative, much simpler
proof of this result in Section V-A below.

Lemma 3: The capacity of the SIMO channel (1) with M
receive antennas, Q = 1, and $N \ge 2$ is given by

$$C(\rho) = (1 - 1/N)\log\rho + O(1), \quad \rho \to \infty. \tag{7}$$

A. Proof of Lemma 3

1) Geometric Intuition: When Q = 1, we can rewrite the IO
relation as

$$\mathbf{Y} = \mathbf{s}\mathbf{x}^T + \mathbf{W}$$

where $\mathbf{s} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_M)$. We next provide a geometric argument
illustrating why the SIMO capacity pre-log coincides with the
SISO capacity pre-log when Q = 1. A similar argument can be
found in [5]. Let x be an arbitrary vector in $\mathbb{C}^N$. In the absence of
noise, the rows of Y are collinear with x. The only information
the receiver can recover (in the absence of noise) about the
transmit vector x from any of these rows is the line on which x
lies. A line in $\mathbb{C}^N$ is characterized by $N - 1$ complex parameters.
Hence, as argued in [9], the receive signal Y carries $N - 1$
parameters describing x. This number, divided by N, coincides
with the capacity pre-log we want to establish. As one row
of Y is sufficient to recover the $N - 1$ parameters describing
the line on which x lies, adding more receive antennas does not
appear to be beneficial. We next prove this result by sandwiching
capacity between a lower bound and an upper bound that are
tight at high SNR.
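
The geometric argument can be illustrated with a few lines of NumPy. In this sketch (dimensions and variable names are illustrative assumptions), the noiseless output $\mathbf{s}\mathbf{x}^T$ has rank one, and dividing each row by its first entry removes the unknown fading gain, so that every row yields the same N - 1 free complex parameters of the line spanned by x.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 4, 6

s = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
Y0 = np.outer(s, x)  # noiseless receive matrix s x^T (no conjugation)

rank = np.linalg.matrix_rank(Y0)

# Row m of Y0 is s_m * x. Normalizing by the first entry cancels s_m,
# leaving (1, x_2/x_1, ..., x_N/x_1): one fixed entry and N - 1 free ones.
rows_normalised = Y0 / Y0[:, :1]
same_line = np.allclose(rows_normalised, x / x[0])
```

All M rows collapse to the same normalized vector, which is why additional receive antennas reveal nothing new about x in the noiseless rank-1 setting.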

2) A Capacity Lower Bound: The RHS of (7) is a lower
bound on capacity. This result follows directly from [3, Prop. 7].

3) A Matching Upper Bound Through Duality: Establishing 
an asymptotically tight capacity upper bound is more involved. 
Our proof is based on duality [4], a technique that allows us to
obtain a tight upper bound on $I(\mathbf{x}; \mathbf{Y})$ by carefully choosing a
probability distribution on Y. More precisely, let $W(\cdot \mid \mathbf{x})$ denote
the conditional distribution of Y given x, and let $QW$ denote the
distribution induced on Y by the input distribution Q and by the
channel $W(\cdot \mid \mathbf{x})$. Finally, let R be an arbitrary distribution on
Y with probability density function (pdf) $r(\mathbf{Y})$. We use duality
to upper-bound the mutual information $I(\mathbf{x}; \mathbf{Y})$ as follows [4,
Thm. 5.1]:

$$\begin{aligned}
I(\mathbf{x}; \mathbf{Y}) &\le \mathbb{E}_{Q}\big[D(W(\cdot \mid \mathbf{x}) \,\|\, R(\cdot))\big] \\
&= -\mathbb{E}_{QW}[\log r(\mathbf{Y})] - h(\mathbf{Y} \mid \mathbf{x}).
\end{aligned} \tag{8}$$



To get a tight capacity upper bound, the output distribution R
must be chosen appropriately. For the SISO case, this choice can
be motivated as follows: the geometry unveiled in Section V-A1
suggests using the subspace spanned by x to convey information.
This can be achieved by choosing an input distribution that is
uniformly distributed on the sphere in $\mathbb{C}^N$ with radius $\sqrt{N\rho}$.
The output distribution induced by this input distribution in the
absence of additive noise turns out to yield a tight capacity upper
bound, as shown in [5].

Generalizing this approach to the SIMO case is not straight- 
forward. The reason is as follows: for any choice of the input 
distribution, the matrix sx T has rank at most 1, whereas the 
additive noise matrix W has full rank with probability one. This 
implies that, independently of the choice of the input distribution, 
the induced output distribution in the absence of additive noise is 
not absolutely continuous [11, Def. 6.7] with respect to $W(\cdot \mid \mathbf{x})$,
and, hence, the RHS of (8) diverges. To get a tight bound, one
needs to choose an output distribution for which Y has full rank 
with probability one. This implies that, differently from the SISO 
case, the additive noise needs to be accounted for in the choice 
of the output distribution. 

To shed light on how this can be done, it is convenient to 
express Y in terms of its singular-value decomposition (SVD). 
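More specifically, let $P = \min\{M, N\}$ and $L = \max\{M, N\}$;
then Y can be written as $\mathbf{Y} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$, where $\mathbf{U} \in \mathbb{C}^{M \times P}$
and $\mathbf{V} \in \mathbb{C}^{N \times P}$ are (truncated) unitary matrices, and
$\boldsymbol{\Sigma} = \operatorname{diag}\{[\sigma_1(\mathbf{Y}) \cdots \sigma_P(\mathbf{Y})]\}$ contains the singular values of Y
in descending order. To make the SVD unique, we assume
that the first row of U is real and non-negative. We shall take
an output distribution for which $\sigma_1(\mathbf{Y})$ is distributed as the
nonzero singular value of the noiseless receive matrix $\mathbf{s}\mathbf{x}^T$ and
the remaining singular values are distributed as the ordered
singular values of an $(M - 1) \times (N - 1)$ random matrix with
i.i.d. $\mathcal{CN}(0, 1)$ entries. More specifically, we take (see footnote 2)

$$r(\sigma_1, \ldots, \sigma_P) = r(\sigma_1) \cdot r(\sigma_2, \ldots, \sigma_P)$$

where

$$r(\sigma_1) = \frac{2\sigma_1}{MN\rho}\, e^{-\sigma_1^2/(MN\rho)}, \quad \sigma_1 > 0$$

and [12, Thm. 2.17]

$$r(\sigma_2, \ldots, \sigma_P) = \prod_{i=2}^{P} \frac{2\, e^{-\sigma_i^2}\, \sigma_i^{2(L-P)+1}}{(L-i)!\,(P-i)!} \prod_{i=2}^{P-1} \prod_{j=i+1}^{P} (\sigma_i^2 - \sigma_j^2)^2, \quad \sigma_2 > \cdots > \sigma_P > 0.$$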

Finally, we take V and U independent of the singular values and
uniformly distributed (with respect to the Haar measure) on the
Stiefel manifold $S(N, P)$ (see footnote 3), and on the submanifold of $S(M, P)$
induced by the nonnegativity of the first row of U, respectively.
We next evaluate the RHS of (8) for the resulting output pdf,
which we (still) denote by $r(\mathbf{Y})$. The conditional differential
entropy $h(\mathbf{Y} \mid \mathbf{x})$ in (8) can be easily computed:

$$h(\mathbf{Y} \mid \mathbf{x}) = M\, \mathbb{E}_{\mathbf{x}}\big[\log(\|\mathbf{x}\|^2 + 1)\big] + MN \log(\pi e). \tag{9}$$

To evaluate the first term on the RHS of (8), it is convenient to
express $r(\mathbf{Y})$ in the SVD coordinate system. By the change of
variables theorem [11, Thm. 7.26], we get

$$\begin{aligned}
-\mathbb{E}_{QW}[\log r(\mathbf{Y})] = {} & -\mathbb{E}_{QW}[\log r(\mathbf{U}, \boldsymbol{\Sigma}, \mathbf{V})] \\
& + \mathbb{E}_{QW}[\log J_{M,N}(\sigma_1, \ldots, \sigma_P)]
\end{aligned} \tag{10}$$

where $J_{M,N}(\sigma_1, \ldots, \sigma_P)$ is the Jacobian of the SVD, which is
given by [5, App. A]

$$J_{M,N}(\sigma_1, \ldots, \sigma_P) = \prod_{i=1}^{P} \sigma_i^{2(L-P)+1} \prod_{i=1}^{P-1} \prod_{j=i+1}^{P} (\sigma_i^2 - \sigma_j^2)^2.$$

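By construction, we have that

$$\begin{aligned}
-\mathbb{E}_{QW}[\log r(\mathbf{U}, \boldsymbol{\Sigma}, \mathbf{V})] = {} & \underbrace{-\mathbb{E}_{QW}[\log r(\mathbf{U})] - \mathbb{E}_{QW}[\log r(\mathbf{V})]}_{=\, O(1),\ \rho \to \infty} \\
& - \mathbb{E}_{QW}[\log r(\sigma_1)] - \mathbb{E}_{QW}[\log r(\sigma_2, \ldots, \sigma_P)] \\
= {} & \log\rho - \mathbb{E}_{QW}[\log \sigma_1] + \underbrace{\frac{\mathbb{E}_{QW}[\sigma_1^2]}{MN\rho} + \mathbb{E}_{QW}\Big[\sum_{i=2}^{P} \sigma_i^2\Big]}_{\triangleq\, c_1(\rho)} \\
& - \sum_{i=2}^{P-1} \sum_{j=i+1}^{P} \mathbb{E}_{QW}\big[\log(\sigma_i^2 - \sigma_j^2)^2\big] \\
& - \sum_{i=2}^{P} \mathbb{E}_{QW}\big[\log \sigma_i^{2(L-P)+1}\big] + O(1), \quad \rho \to \infty.
\end{aligned} \tag{11}$$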



2 We shall indicate $\sigma_i(\mathbf{Y})$ simply as $\sigma_i$ whenever no ambiguity occurs.

3 The set of complex $m \times n$ ($n \ge m$) unitary matrices forms a manifold
$S(n, m)$ of $2mn - m^2$ real dimensions, called the Stiefel manifold [13].
This manifold has volume $|S(n, m)| = \prod_{i=n-m+1}^{n} 2\pi^i/(i - 1)!$.



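The expectation of the Jacobian in (10) can be rewritten as

$$\begin{aligned}
\mathbb{E}_{QW}[\log J_{M,N}(\sigma_1, \ldots, \sigma_P)] = {} & \mathbb{E}_{QW}\big[\log \sigma_1^{2(L-P)+1}\big] + \sum_{i=2}^{P} \mathbb{E}_{QW}\big[\log \sigma_i^{2(L-P)+1}\big] \\
& + \sum_{i=2}^{P-1} \sum_{j=i+1}^{P} \mathbb{E}_{QW}\big[\log(\sigma_i^2 - \sigma_j^2)^2\big] \\
& + \sum_{j=2}^{P} \mathbb{E}_{QW}\big[\underbrace{\log(\sigma_1^2 - \sigma_j^2)^2}_{\le\, \log \sigma_1^4}\big].
\end{aligned} \tag{12}$$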



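Substituting (11) and (12) into (10), we obtain

$$-\mathbb{E}_{QW}[\log r(\mathbf{Y})] \le \log\rho + (N + M - 2)\, \mathbb{E}_{QW}[\log \sigma_1^2] + c_1(\rho) + O(1), \quad \rho \to \infty. \tag{13}$$

Finally, substituting (13) and (9) into (8), we get

$$\begin{aligned}
I(\mathbf{x}; \mathbf{Y}) \le {} & \log\rho + (N - 2)\, \mathbb{E}_{QW}[\log \sigma_1^2] + c_1(\rho) \\
& + M \underbrace{\big(\mathbb{E}_{QW}[\log \sigma_1^2] - \mathbb{E}_{\mathbf{x}}[\log(\|\mathbf{x}\|^2 + 1)]\big)}_{\triangleq\, c_2(\rho)} + O(1), \quad \rho \to \infty.
\end{aligned}$$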

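We conclude the proof by showing that $\mathbb{E}_{QW}[\log \sigma_1^2] \le \log\rho + O(1)$,
$\rho \to \infty$, and that $c_1(\rho)$ and $c_2(\rho)$ can be upper-bounded
by finite constants. For the first term, we have that

$$\mathbb{E}_{QW}[\log \sigma_1^2] \le \mathbb{E}_{QW}\big[\log \operatorname{tr}\{\mathbf{Y}^H \mathbf{Y}\}\big] \overset{(a)}{\le} \log \sum_{k=1}^{N} \mathbb{E}_{QW}\big[\|\mathbf{y}_k\|^2\big] \overset{(b)}{\le} \log\rho + O(1). \tag{14}$$

Here, in (a) we used Jensen's inequality and (b) follows from (2).
To show that $c_1(\rho)$ and $c_2(\rho)$ are bounded, the following lemma
will turn out to be useful.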

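Lemma 4 ([14, Sec. 7.3]): Let $\mathbf{A}, \mathbf{B} \in \mathbb{C}^{m \times n}$ and $p = \min\{m, n\}$. Then

$$\sigma_{i+j-1}(\mathbf{A} + \mathbf{B}) \le \sigma_i(\mathbf{A}) + \sigma_j(\mathbf{B}), \quad 1 \le i, j \le p, \quad i + j \le p + 1.$$

If we choose $\mathbf{A} = \mathbf{s}\mathbf{x}^T$ and $\mathbf{B} = \mathbf{W}$, we obtain from Lemma 4
that

$$\sigma_i(\mathbf{Y}) \le \begin{cases} \|\mathbf{s}\|\,\|\mathbf{x}\| + \sigma_1(\mathbf{W}), & i = 1 \\ \sigma_{i-1}(\mathbf{W}), & 2 \le i \le P. \end{cases} \tag{15}$$

By using (15), it follows that

$$\mathbb{E}_{QW}\Big[\sum_{i=2}^{P} \sigma_i^2(\mathbf{Y})\Big] \le \mathbb{E}_{\mathbf{W}}\Big[\sum_{i=1}^{P-1} \sigma_i^2(\mathbf{W})\Big] \le MN.$$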



This inequality, together with the inequality

$$\mathbb{E}_{QW}[\sigma_1^2] \le MN(\rho + 1)$$

which can be established using similar steps to the ones leading
to (14), is sufficient to conclude that $c_1(\rho)$ is bounded.
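
The singular-value bounds obtained from Lemma 4 can be spot-checked numerically. The sketch below draws a single realization of the rank-1 channel output with arbitrary illustrative parameters (an assumption made here) and tests both cases of the bound on the singular values of Y. One draw is of course only a sanity check, not a substitute for the lemma.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, rho = 3, 5, 50.0

s = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
x = np.sqrt(rho) * np.exp(2j * np.pi * rng.random(N))
W = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
Y = np.outer(s, x) + W  # rank-1 signal plus full-rank noise

# numpy returns singular values in descending order.
sv_Y = np.linalg.svd(Y, compute_uv=False)
sv_W = np.linalg.svd(W, compute_uv=False)

tol = 1e-9  # slack for floating-point rounding
# Case i = 1: sigma_1(Y) <= ||s|| ||x|| + sigma_1(W).
first_ok = bool(sv_Y[0] <= np.linalg.norm(s) * np.linalg.norm(x) + sv_W[0] + tol)
# Case 2 <= i <= P: sigma_i(Y) <= sigma_{i-1}(W).
rest_ok = bool(np.all(sv_Y[1:] <= sv_W[:-1] + tol))
# Consequence: the energy in the trailing singular values of Y is at most ||W||_F^2.
tail_energy_ok = bool(np.sum(sv_Y[1:] ** 2) <= np.linalg.norm(W) ** 2 + tol)
```

The second case is what keeps the trailing singular values of Y bounded independently of the SNR, since they are controlled by the noise alone.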



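To establish that $c_2(\rho)$ is bounded, we start by noting that
the first term in the expression that defines $c_2(\rho)$ can be upper-bounded
as follows:

$$\begin{aligned}
\mathbb{E}_{QW}[\log \sigma_1^2(\mathbf{Y})] &\overset{(a)}{\le} 2\, \mathbb{E}_{QW}\big[\log(\|\mathbf{s}\|\,\|\mathbf{x}\| + \sigma_1(\mathbf{W}))\big] \\
&\overset{(b)}{\le} 2\, \mathbb{E}_{\mathbf{x}}\big[\log\big(\mathbb{E}_{\mathbf{s}, \mathbf{W}}[\|\mathbf{s}\|\,\|\mathbf{x}\| + \sigma_1(\mathbf{W})]\big)\big] \\
&\overset{(c)}{\le} \mathbb{E}_{\mathbf{x}}\big[\log\big(M(\|\mathbf{x}\| + \sqrt{N})^2\big)\big].
\end{aligned}$$

Here, (a) follows from (15), (b) holds because of Jensen's
inequality, and in (c) we used that $\mathbb{E}[\|\mathbf{s}\|] \le \sqrt{M}$ and that

$$(\mathbb{E}[\sigma_1(\mathbf{W})])^2 \le \mathbb{E}\big[(\sigma_1(\mathbf{W}))^2\big] \le \mathbb{E}\big[\operatorname{tr}\{\mathbf{W}^H \mathbf{W}\}\big] = MN.$$

Hence,

$$c_2(\rho) \le \mathbb{E}_{\mathbf{x}}\bigg[\log \frac{M(\|\mathbf{x}\| + \sqrt{N})^2}{\|\mathbf{x}\|^2 + 1}\bigg] \le \sup_{\mathbf{x}} \bigg\{\log \frac{M(\|\mathbf{x}\| + \sqrt{N})^2}{\|\mathbf{x}\|^2 + 1}\bigg\} = \log[M(N + 1)].$$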



This concludes the proof. 